Cross-lingual Interpolation of Speech Recognition Models
Abstract
A method is proposed for the cross-lingual porting of recognition models, aimed at rapid prototyping of speech recognisers in new target languages, specifically when collecting large speech corpora for training would be economically questionable. The paper describes how to build a multilingual model that covers the phonetic structure of all the constituent languages and that can be exploited to interpolate the recognition units of a different language. The CTSU (Classes of Transitory-Stationary Units) approach is used to derive a well-balanced set of recognition models, as a reasonable trade-off between precision and trainability. The phonemes of the untrained language are mapped onto the multilingual inventory of recognition units, and the corresponding CTSUs are then obtained. The procedure was tested on a preliminary set of 10 Rumanian speakers, starting from an Italian-English-Spanish CTSU model. The optimal mapping of the Rumanian vowel phone set onto the multilingual phone set was obtained by inspecting the F1 and F2 formants of the vowel sounds from two male and female Rumanian speakers and comparing them with the F1 and F2 values of the other three languages. Results are reported as word recognition accuracy measured on a preliminary test set of 10 speakers.
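The vowel-mapping step can be illustrated with a minimal Python sketch. The abstract says the mapping was obtained by inspecting F1 and F2 values; the nearest-neighbour rule below is an assumed, automated stand-in for that inspection, not the authors' procedure. All formant values, the phone labels `v1` to `v3`, and the names `MULTILINGUAL_VOWELS`, `nearest_unit`, and `map_vowels` are hypothetical placeholders chosen for illustration.

```python
import math

# Hypothetical mean (F1, F2) values in Hz for a multilingual vowel inventory.
# These are placeholders for illustration, not measurements from the paper.
MULTILINGUAL_VOWELS = {
    "i": (300.0, 2300.0),
    "e": (450.0, 2000.0),
    "a": (750.0, 1300.0),
    "o": (500.0, 900.0),
    "u": (320.0, 800.0),
}


def nearest_unit(f1, f2, inventory):
    """Return the inventory phone closest to (f1, f2) in Euclidean distance."""
    return min(inventory, key=lambda phone: math.dist((f1, f2), inventory[phone]))


def map_vowels(target, inventory):
    """Map each target-language vowel onto its nearest multilingual unit."""
    return {
        phone: nearest_unit(f1, f2, inventory)
        for phone, (f1, f2) in target.items()
    }


# Hypothetical target-language vowels (also placeholders, not Rumanian data).
TARGET_VOWELS = {
    "v1": (720.0, 1350.0),
    "v2": (430.0, 1900.0),
    "v3": (340.0, 850.0),
}

mapping = map_vowels(TARGET_VOWELS, MULTILINGUAL_VOWELS)
print(mapping)
```

A plain Euclidean distance in Hz weights F2 more heavily than F1 because F2 spans a wider range. A perceptual scale such as Bark or mel, or per-speaker normalisation, might be a better choice in practice.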
Similar resources
Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds
The aim of this work is to exploit the acoustic-phonetic similarities between several languages. In recent work cross-language HMM-based phoneme models have been used only for bootstrapping the language-dependent models and the multi-lingual approach has been investigated only on very small speech corpora. In this paper, we introduce a statistical distance measure to determine the similarities...
Presentation of K Nearest Neighbor Gaussian Interpolation and comparing it with Fuzzy Interpolation in Speech Recognition
The Hidden Markov Model is a popular statistical method used in continuous and discrete speech recognition. The probability density function of the observation vectors in each state is estimated with discrete-density or continuous-density modeling. The performance (in correct word recognition rate) of the continuous-density HMM is higher than that of the discrete-density HMM, but its computational complexity is very ...
Cross Lingual Modelling Experiments for Indonesian
The extension of Large Vocabulary Continuous Speech Recognition (LVCSR) to resource-poor languages such as Indonesian is hindered by the lack of transcribed acoustic data and appropriate pronunciation lexicons. Research has generally been directed toward establishing robust cross-lingual acoustic models, with the assumption that phonetic lexicons are readily available. This is not the case for ...
Sequence-based Multi-lingual Low Resource Speech Recognition
Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends...